A New Approach for English-Chinese Named Entity Alignment
Abstract
Traditional word alignment approaches do not produce satisfactory results for named entities. In this paper, we propose a novel approach to named entity alignment that uses a maximum entropy model. To ease the training of the maximum entropy model, bootstrapping is used to support supervised learning. Unlike previous work reported in the literature, our approach performs bilingual named entity alignment without word segmentation for Chinese, and it performs much better than alignment with word segmentation. Experimental results show that our approach significantly outperforms the IBM Model 4 and HMM alignment models.
Similar resources
Toward a Name Entity Aligned Bilingual Corpus
This paper describes a co-training framework in which, through named entity aligned bilingual text, named entity taggers can complement and improve each other via an iterative process. This co-training approach allows us to 1) apply our method to not only parallel but also comparable text, greatly extending the applicability of the approach; and to 2) adapt named entity taggers to new domains; ...
English-to-Chinese Machine Transliteration using Accessor Variety Features of Source Graphemes
This work presents a grapheme-based approach to English-to-Chinese (E2C) transliteration, which consists of many-to-many (M2M) alignment and conditional random fields (CRF) using accessor variety (AV) as an additional feature to approximate the local context of source graphemes. Experimental results show that the AV of a given English named entity generally improves the effectiveness of E2C transliteration.
On Jointly Recognizing and Aligning Bilingual Named Entities
We observe that (1) how a given named entity (NE) is translated (i.e., either semantically or phonetically) depends greatly on its associated entity type, and (2) entities within an aligned pair should share the same type. Also, (3) those initially detected NEs are anchors, whose information should be used to give certainty scores when selecting candidates. From this basis, an integrated model ...
Discovery of Unknown Events From Multi-lingual News
We have proposed a new approach to detect topically-related events from multi-lingual news sources. In particular, we are interested in Chinese and English on-line newswire stories. Three categories of named entity terms, namely, people names, geographical location names, and organization names, together with the story content terms constitute the basis for story representation. The named ent...
Sentence Alignment of Hungarian-English Parallel Corpora Using a Hybrid Algorithm
We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm is composed of a length-based and anchor matching method that uses Named Entity recognition. This algorithm combines the speed of length-based models with the accuracy of anchor finding methods. The accuracy of finding cognates for Hungarian-English language pair is e...